Overview

Dataset statistics

Number of variables: 12
Number of observations: 2985
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 280.0 KiB
Average record size in memory: 96.0 B

Variable types

Numeric: 12

Warnings

gross_revenue is highly correlated with invoice_no and 1 other field [High correlation]
invoice_no is highly correlated with gross_revenue and 1 other field [High correlation]
quantity is highly correlated with gross_revenue and 1 other field [High correlation]
avg_ticket is highly correlated with qtde_returns and 1 other field [High correlation]
qtde_returns is highly correlated with avg_ticket and 1 other field [High correlation]
avg_basket_size is highly correlated with avg_ticket and 1 other field [High correlation]
gross_revenue is highly correlated with invoice_no and 2 other fields [High correlation]
recency_days is highly correlated with invoice_no [High correlation]
invoice_no is highly correlated with gross_revenue and 2 other fields [High correlation]
quantity is highly correlated with gross_revenue and 2 other fields [High correlation]
avg_ticket is highly correlated with avg_unique_basket_size [High correlation]
avg_recency_days is highly correlated with frequency [High correlation]
frequency is highly correlated with avg_recency_days [High correlation]
avg_basket_size is highly correlated with gross_revenue and 1 other field [High correlation]
avg_unique_basket_size is highly correlated with avg_ticket [High correlation]
gross_revenue is highly correlated with invoice_no and 1 other field [High correlation]
invoice_no is highly correlated with gross_revenue and 1 other field [High correlation]
quantity is highly correlated with gross_revenue and 1 other field [High correlation]
avg_recency_days is highly correlated with frequency [High correlation]
frequency is highly correlated with avg_recency_days [High correlation]
gross_revenue is highly correlated with quantity and 4 other fields [High correlation]
quantity is highly correlated with gross_revenue and 1 other field [High correlation]
avg_ticket is highly correlated with gross_revenue and 2 other fields [High correlation]
qtde_returns is highly correlated with gross_revenue and 2 other fields [High correlation]
avg_basket_size is highly correlated with gross_revenue and 2 other fields [High correlation]
invoice_no is highly correlated with gross_revenue and 1 other field [High correlation]
avg_ticket is highly skewed (γ1 = 53.71859111) [Skewed]
qtde_returns is highly skewed (γ1 = 51.85337872) [Skewed]
avg_basket_size is highly skewed (γ1 = 45.92326414) [Skewed]
df_index has unique values [Unique]
customer_id has unique values [Unique]
recency_days has 34 (1.1%) zeros [Zeros]
qtde_returns has 1453 (48.7%) zeros [Zeros]

Reproduction

Analysis started: 2021-05-20 22:16:56.302069
Analysis finished: 2021-05-20 22:17:12.800178
Duration: 16.5 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json

Variables

df_index
Real number (ℝ≥0)

UNIQUE

Distinct: 2985
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2375.015075
Minimum: 0
Maximum: 5896
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 23.4 KiB

Quantile statistics

Minimum: 0
5th percentile: 186.2
Q1: 947
median: 2163
Q3: 3617
95th percentile: 5207.2
Maximum: 5896
Range: 5896
Interquartile range (IQR): 2670
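The fractional percentiles above (186.2 and 5207.2 for an integer-valued column) are consistent with linear interpolation between neighbouring sorted values. The sketch below assumes the report follows pandas' default `Series.quantile` rule (`interpolation="linear"`); `quantile_linear` and `quantile_summary` are hypothetical helper names, not pandas-profiling functions.

```python
import math


def quantile_linear(sorted_values, q):
    """Return the q-th quantile (0 <= q <= 1) of an already sorted sequence.

    Assumption: linear interpolation between the two nearest order
    statistics, as in pandas' default Series.quantile(interpolation="linear").
    """
    if not sorted_values:
        raise ValueError("quantile of an empty sequence")
    if not 0 <= q <= 1:
        raise ValueError("q must lie in [0, 1]")
    pos = q * (len(sorted_values) - 1)  # fractional 0-based position
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * frac


def quantile_summary(values):
    """Rebuild a 'Quantile statistics' block for a sequence of numbers."""
    xs = sorted(values)
    s = {
        "Minimum": xs[0],
        "5th percentile": quantile_linear(xs, 0.05),
        "Q1": quantile_linear(xs, 0.25),
        "median": quantile_linear(xs, 0.50),
        "Q3": quantile_linear(xs, 0.75),
        "95th percentile": quantile_linear(xs, 0.95),
        "Maximum": xs[-1],
    }
    s["Range"] = s["Maximum"] - s["Minimum"]
    s["Interquartile range (IQR)"] = s["Q3"] - s["Q1"]
    return s
```

With n = 2985, the 5th percentile sits at position 0.05 × 2984 = 149.2, that is, 20% of the way between two neighbouring sorted values, which is how an integer column can produce a value such as 186.2.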

Descriptive statistics

Standard deviation: 1604.366167
Coefficient of variation (CV): 0.6755183087
Kurtosis: -0.9931192818
Mean: 2375.015075
Median Absolute Deviation (MAD): 1302
Skewness: 0.3594335959
Sum: 7089420
Variance: 2573990.798
Monotonicity: Strictly increasing
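Several of these figures are tied together. Variance is the square of the standard deviation, CV is the standard deviation divided by the mean (1604.366167 / 2375.015075 ≈ 0.6755), and Mean is Sum divided by the number of observations (7089420 / 2985). The sketch below recomputes the block with the standard library. It assumes pandas' conventions: sample standard deviation (ddof = 1), bias-adjusted skewness (G1), bias-adjusted excess kurtosis (G2, about 0 for normal data), and MAD as the median absolute deviation from the median, as the label says. The function names, and the monotonicity labels other than the two that appear in this report, are illustrative.

```python
import math
import statistics


def monotonicity(xs):
    """Classify a sequence; only the first and last labels appear in this report."""
    pairs = list(zip(xs, xs[1:]))
    if all(a < b for a, b in pairs):
        return "Strictly increasing"
    if all(a <= b for a, b in pairs):
        return "Increasing"
    if all(a > b for a, b in pairs):
        return "Strictly decreasing"
    if all(a >= b for a, b in pairs):
        return "Decreasing"
    return "Not monotonic"


def describe(values):
    """Recompute a 'Descriptive statistics' block with pandas-style estimators."""
    xs = list(values)
    n = len(xs)
    if n < 4:
        raise ValueError("need at least 4 observations for the kurtosis estimator")
    mean = statistics.fmean(xs)
    std = statistics.stdev(xs)  # sample standard deviation (ddof = 1)
    med = statistics.median(xs)

    # Sums of powers of the deviations from the mean.
    dev = [x - mean for x in xs]
    m2 = sum(d ** 2 for d in dev)
    m3 = sum(d ** 3 for d in dev)
    m4 = sum(d ** 4 for d in dev)
    if m2 == 0:
        raise ValueError("constant data: skewness and kurtosis are undefined")

    # Bias-adjusted Fisher-Pearson skewness (G1).
    skew = n * math.sqrt(n - 1) / (n - 2) * m3 / m2 ** 1.5
    # Bias-adjusted excess kurtosis (G2).
    adj = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    kurt = n * (n + 1) * (n - 1) * m4 / ((n - 2) * (n - 3) * m2 ** 2) - adj

    return {
        "Standard deviation": std,
        "Coefficient of variation (CV)": std / mean if mean != 0 else math.nan,
        "Kurtosis": kurt,
        "Mean": mean,
        "Median Absolute Deviation (MAD)": statistics.median([abs(x - med) for x in xs]),
        "Skewness": skew,
        "Sum": sum(xs),
        "Variance": std ** 2,
        "Monotonicity": monotonicity(xs),
    }
```

For example, `describe([1, 2, 3, 4, 5])` gives a variance of 2.5, a skewness of 0 and an excess kurtosis of -1.2, the same values pandas' `Series.var`, `skew` and `kurt` return for that input.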
Histogram with fixed size bins (bins=50)

Common values
Value                 Count  Frequency (%)
0                     1      < 0.1%
2702                  1      < 0.1%
641                   1      < 0.1%
5236                  1      < 0.1%
5058                  1      < 0.1%
2694                  1      < 0.1%
647                   1      < 0.1%
2696                  1      < 0.1%
649                   1      < 0.1%
651                   1      < 0.1%
Other values (2975)   2975   99.7%

Minimum 10 values
Value                 Count  Frequency (%)
0                     1      < 0.1%
1                     1      < 0.1%
2                     1      < 0.1%
3                     1      < 0.1%
4                     1      < 0.1%
5                     1      < 0.1%
6                     1      < 0.1%
7                     1      < 0.1%
8                     1      < 0.1%
9                     1      < 0.1%

Maximum 10 values
Value                 Count  Frequency (%)
5896                  1      < 0.1%
5877                  1      < 0.1%
5867                  1      < 0.1%
5861                  1      < 0.1%
5840                  1      < 0.1%
5836                  1      < 0.1%
5830                  1      < 0.1%
5819                  1      < 0.1%
5818                  1      < 0.1%
5808                  1      < 0.1%
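Each variable's value tables list the ten most frequent values, then an "Other values (k)" row that covers the remaining k distinct values, followed by the ten smallest and the ten largest distinct values. A minimal sketch with `collections.Counter` follows; `value_tables` is a hypothetical name. When several values are equally frequent, which ones are shown depends on implementation details. For df_index every value occurs once, so Counter's first-occurrence order need not match the report.

```python
from collections import Counter


def value_tables(values, top=10):
    """Return (common, other, minimum, maximum); rows are (value, count, percent)."""
    values = list(values)
    n = len(values)
    counts = Counter(values)

    def row(value, count):
        return (value, count, 100.0 * count / n)

    # Most frequent values; Counter breaks ties by first occurrence.
    common = [row(v, c) for v, c in counts.most_common(top)]
    shown = sum(c for _, c, _ in common)
    other = (
        f"Other values ({len(counts) - len(common)})",
        n - shown,
        100.0 * (n - shown) / n,
    )

    # Extreme values: smallest and largest distinct values with their counts.
    distinct = sorted(counts)
    minimum = [row(v, counts[v]) for v in distinct[:top]]
    maximum = [row(v, counts[v]) for v in reversed(distinct[-top:])]
    return common, other, minimum, maximum
```

For df_index (2985 distinct values) this yields "Other values (2975)" covering 2975 rows, or 99.7% of the observations, as in the table above.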

customer_id
Real number (ℝ≥0)

UNIQUE

Distinct: 2985
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15270.2201
Minimum: 12347
Maximum: 18287
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.4 KiB

Quantile statistics

Minimum: 12347
5th percentile: 12615.2
Q1: 13792
median: 15223
Q3: 16771
95th percentile: 17964.8
Maximum: 18287
Range: 5940
Interquartile range (IQR): 2979

Descriptive statistics

Standard deviation: 1721.133195
Coefficient of variation (CV): 0.1127117477
Kurtosis: -1.208464419
Mean: 15270.2201
Median Absolute Deviation (MAD): 1490
Skewness: 0.02916556856
Sum: 45581607
Variance: 2962299.476
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value                 Count  Frequency (%)
16384                 1      < 0.1%
12987                 1      < 0.1%
14984                 1      < 0.1%
17033                 1      < 0.1%
13704                 1      < 0.1%
12939                 1      < 0.1%
17037                 1      < 0.1%
14125                 1      < 0.1%
13363                 1      < 0.1%
18164                 1      < 0.1%
Other values (2975)   2975   99.7%

Minimum 10 values
Value                 Count  Frequency (%)
12347                 1      < 0.1%
12348                 1      < 0.1%
12352                 1      < 0.1%
12356                 1      < 0.1%
12358                 1      < 0.1%
12359                 1      < 0.1%
12360                 1      < 0.1%
12362                 1      < 0.1%
12364                 1      < 0.1%
12370                 1      < 0.1%

Maximum 10 values
Value                 Count  Frequency (%)
18287                 1      < 0.1%
18283                 1      < 0.1%
18282                 1      < 0.1%
18277                 1      < 0.1%
18276                 1      < 0.1%
18274                 1      < 0.1%
18273                 1      < 0.1%
18272                 1      < 0.1%
18270                 1      < 0.1%
18269                 1      < 0.1%

gross_revenue
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 2979
Distinct (%): 99.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2782.437548
Minimum: 6.2
Maximum: 280206.02
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.4 KiB

Quantile statistics

Minimum: 6.2
5th percentile: 229.62
Q1: 573.22
median: 1098.43
Q3: 2330.92
95th percentile: 7320.916
Maximum: 280206.02
Range: 280199.82
Interquartile range (IQR): 1757.7

Descriptive statistics

Standard deviation: 10635.51096
Coefficient of variation (CV): 3.822371853
Kurtosis: 348.5532904
Mean: 2782.437548
Median Absolute Deviation (MAD): 682.35
Skewness: 16.62512019
Sum: 8305576.08
Variance: 113114093.5
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value                 Count  Frequency (%)
63                    2      0.1%
734.94                2      0.1%
379.65                2      0.1%
745.06                2      0.1%
331                   2      0.1%
731.9                 2      0.1%
26879.04              1      < 0.1%
284.46                1      < 0.1%
610.52                1      < 0.1%
605.12                1      < 0.1%
Other values (2969)   2969   99.5%

Minimum 10 values
Value                 Count  Frequency (%)
6.2                   1      < 0.1%
6.9                   1      < 0.1%
13.3                  1      < 0.1%
15                    1      < 0.1%
36.56                 1      < 0.1%
52                    1      < 0.1%
52.2                  1      < 0.1%
52.2                  1      < 0.1%
62.43                 1      < 0.1%
63                    2      0.1%

Maximum 10 values
Value                 Count  Frequency (%)
280206.02             1      < 0.1%
259657.3              1      < 0.1%
194550.79             1      < 0.1%
168472.5              1      < 0.1%
143825.06             1      < 0.1%
124914.53             1      < 0.1%
117379.63             1      < 0.1%
91062.38              1      < 0.1%
81024.84              1      < 0.1%
66653.56              1      < 0.1%

recency_days
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 272
Distinct (%): 9.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 64.47303183
Minimum: 0
Maximum: 373
Zeros: 34
Zeros (%): 1.1%
Negative: 0
Negative (%): 0.0%
Memory size: 23.4 KiB

Quantile statistics

Minimum: 0
5th percentile: 2
Q1: 11
median: 32
Q3: 82
95th percentile: 242
Maximum: 373
Range: 373
Interquartile range (IQR): 71

Descriptive statistics

Standard deviation: 77.76614786
Coefficient of variation (CV): 1.206181029
Kurtosis: 2.750197
Mean: 64.47303183
Median Absolute Deviation (MAD): 26
Skewness: 1.790720645
Sum: 192452
Variance: 6047.573753
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value                 Count  Frequency (%)
1                     99     3.3%
4                     87     2.9%
3                     85     2.8%
2                     85     2.8%
8                     76     2.5%
10                    67     2.2%
9                     67     2.2%
7                     66     2.2%
17                    64     2.1%
22                    55     1.8%
Other values (262)    2234   74.8%

Minimum 10 values
Value                 Count  Frequency (%)
0                     34     1.1%
1                     99     3.3%
2                     85     2.8%
3                     85     2.8%
4                     87     2.9%
5                     43     1.4%
7                     66     2.2%
8                     76     2.5%
9                     67     2.2%
10                    67     2.2%

Maximum 10 values
Value                 Count  Frequency (%)
373                   2      0.1%
372                   4      0.1%
371                   1      < 0.1%
368                   1      < 0.1%
366                   4      0.1%
365                   3      0.1%
364                   1      < 0.1%
360                   1      < 0.1%
359                   1      < 0.1%
358                   4      0.1%

invoice_no
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 59
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.734338358
Minimum: 1
Maximum: 209
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.4 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 2
median: 4
Q3: 6
95th percentile: 17
Maximum: 209
Range: 208
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 8.90180138
Coefficient of variation (CV): 1.552367653
Kurtosis: 194.6807795
Mean: 5.734338358
Median Absolute Deviation (MAD): 2
Skewness: 10.86872714
Sum: 17117
Variance: 79.2420678
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value                 Count  Frequency (%)
2                     788    26.4%
3                     505    16.9%
4                     385    12.9%
5                     242    8.1%
1                     193    6.5%
6                     172    5.8%
7                     143    4.8%
8                     98     3.3%
9                     68     2.3%
10                    54     1.8%
Other values (49)     337    11.3%

Minimum 10 values
Value                 Count  Frequency (%)
1                     193    6.5%
2                     788    26.4%
3                     505    16.9%
4                     385    12.9%
5                     242    8.1%
6                     172    5.8%
7                     143    4.8%
8                     98     3.3%
9                     68     2.3%
10                    54     1.8%

Maximum 10 values
Value                 Count  Frequency (%)
209                   1      < 0.1%
201                   1      < 0.1%
124                   1      < 0.1%
97                    1      < 0.1%
93                    1      < 0.1%
91                    1      < 0.1%
86                    1      < 0.1%
73                    1      < 0.1%
63                    1      < 0.1%
62                    1      < 0.1%

quantity
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 49
Distinct (%): 1.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 11.67470687
Minimum: 1
Maximum: 102
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.4 KiB

Quantile statistics

Minimum: 1
5th percentile: 4
Q1: 8
median: 11
Q3: 14
95th percentile: 22
Maximum: 102
Range: 101
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 6.298728019
Coefficient of variation (CV): 0.539519158
Kurtosis: 25.00800929
Mean: 11.67470687
Median Absolute Deviation (MAD): 3
Skewness: 3.080622517
Sum: 34849
Variance: 39.67397465
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=49)

Common values
Value                 Count  Frequency (%)
10                    295    9.9%
9                     260    8.7%
11                    254    8.5%
12                    220    7.4%
8                     214    7.2%
7                     211    7.1%
13                    203    6.8%
14                    167    5.6%
6                     153    5.1%
15                    140    4.7%
Other values (39)     868    29.1%

Minimum 10 values
Value                 Count  Frequency (%)
1                     20     0.7%
2                     32     1.1%
3                     59     2.0%
4                     82     2.7%
5                     107    3.6%
6                     153    5.1%
7                     211    7.1%
8                     214    7.2%
9                     260    8.7%
10                    295    9.9%

Maximum 10 values
Value                 Count  Frequency (%)
102                   1      < 0.1%
74                    1      < 0.1%
60                    1      < 0.1%
59                    1      < 0.1%
58                    1      < 0.1%
56                    1      < 0.1%
54                    1      < 0.1%
50                    2      0.1%
49                    3      0.1%
44                    4      0.1%
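Every histogram in this report uses 50 fixed-size bins except quantity, which uses 49 and also has exactly 49 distinct values. That pattern suggests the bin count is capped at the number of distinct values. The sketch below assumes this rule, which is inferred from the captions here rather than confirmed from pandas-profiling's source, and computes equal-width bin counts with the standard library. `fixed_width_histogram` is a hypothetical name.

```python
def fixed_width_histogram(values, max_bins=50):
    """Equal-width histogram between min and max, returning (edges, counts).

    Assumption inferred from this report's captions: the number of bins is
    min(max_bins, number of distinct values).
    """
    values = list(values)
    lo, hi = min(values), max(values)
    bins = min(max_bins, len(set(values)))
    if lo == hi:
        return [lo, hi], [len(values)]
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins)] + [hi]
    counts = [0] * bins
    for x in values:
        # Bins are half-open [a, b), except the last, which also includes hi.
        idx = min(int((x - lo) / width), bins - 1)
        counts[idx] += 1
    return edges, counts
```

With this rule a column such as recency_days (272 distinct values) still gets 50 bins, while quantity (49 distinct values) gets 49.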

avg_ticket
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
SKEWED

Distinct: 2984
Distinct (%): > 99.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 51.74349696
Minimum: 2.150588235
Maximum: 56157.5
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.4 KiB

Quantile statistics

Minimum: 2.150588235
5th percentile: 4.919743966
Q1: 13.1605
median: 18.18108696
Q3: 25.31482759
95th percentile: 90.72201312
Maximum: 56157.5
Range: 56155.34941
Interquartile range (IQR): 12.15432759

Descriptive statistics

Standard deviation: 1033.269312
Coefficient of variation (CV): 19.96906612
Kurtosis: 2916.221951
Mean: 51.74349696
Median Absolute Deviation (MAD): 6.130240562
Skewness: 53.71859111
Sum: 154454.3384
Variance: 1067645.471
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value                 Count  Frequency (%)
16.83333333           2      0.1%
17.49275862           1      < 0.1%
9.418292683           1      < 0.1%
5.01173913            1      < 0.1%
18.82290698           1      < 0.1%
28.89968794           1      < 0.1%
46.07413043           1      < 0.1%
25.77538462           1      < 0.1%
8.745172414           1      < 0.1%
18.15061538           1      < 0.1%
Other values (2974)   2974   99.6%

Minimum 10 values
Value                 Count  Frequency (%)
2.150588235           1      < 0.1%
2.4325                1      < 0.1%
2.462371134           1      < 0.1%
2.504876033           1      < 0.1%
2.50837156            1      < 0.1%
2.65                  1      < 0.1%
2.656931818           1      < 0.1%
2.707598253           1      < 0.1%
2.760621572           1      < 0.1%
2.771005291           1      < 0.1%

Maximum 10 values
Value                 Count  Frequency (%)
56157.5               1      < 0.1%
4453.43               1      < 0.1%
2027.86               1      < 0.1%
1687.2                1      < 0.1%
952.9875              1      < 0.1%
931.5                 1      < 0.1%
872.13                1      < 0.1%
835.864               1      < 0.1%
643.8585714           1      < 0.1%
640                   1      < 0.1%

avg_recency_days
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION

Distinct: 1255
Distinct (%): 42.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 66.90582076
Minimum: 1
Maximum: 366
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.4 KiB

Quantile statistics

Minimum: 1
5th percentile: 7.407171717
Q1: 25.28571429
median: 47.66666667
Q3: 85
95th percentile: 201
Maximum: 366
Range: 365
Interquartile range (IQR): 59.71428571

Descriptive statistics

Standard deviation: 63.47649879
Coefficient of variation (CV): 0.9487440416
Kurtosis: 4.941588072
Mean: 66.90582076
Median Absolute Deviation (MAD): 26.12121212
Skewness: 2.073697691
Sum: 199713.875
Variance: 4029.265899
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value                 Count  Frequency (%)
14                    24     0.8%
4                     23     0.8%
70                    22     0.7%
7                     20     0.7%
1                     19     0.6%
35                    19     0.6%
21                    18     0.6%
11                    18     0.6%
46                    18     0.6%
49                    18     0.6%
Other values (1245)   2786   93.3%

Minimum 10 values
Value                 Count  Frequency (%)
1                     19     0.6%
1.5                   1      < 0.1%
2                     14     0.5%
2.5                   1      < 0.1%
2.565517241           1      < 0.1%
3                     15     0.5%
3.271929825           1      < 0.1%
3.321428571           1      < 0.1%
3.5                   2      0.1%
4                     23     0.8%

Maximum 10 values
Value                 Count  Frequency (%)
366                   1      < 0.1%
365                   1      < 0.1%
363                   1      < 0.1%
362                   1      < 0.1%
357                   2      0.1%
356                   1      < 0.1%
355                   2      0.1%
352                   1      < 0.1%
351                   2      0.1%
350                   3      0.1%

frequency
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION

Distinct: 1355
Distinct (%): 45.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.06544749662
Minimum: 0.005449591281
Maximum: 4
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.4 KiB

Quantile statistics

Minimum: 0.005449591281
5th percentile: 0.009529875645
Q1: 0.01792114695
median: 0.0297029703
Q3: 0.05660377358
95th percentile: 0.2222222222
Maximum: 4
Range: 3.994550409
Interquartile range (IQR): 0.03868262663

Descriptive statistics

Standard deviation: 0.1464278785
Coefficient of variation (CV): 2.237333528
Kurtosis: 207.4452392
Mean: 0.06544749662
Median Absolute Deviation (MAD): 0.01466537631
Skewness: 10.83265063
Sum: 195.3607774
Variance: 0.02144112361
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value                 Count  Frequency (%)
0.02777777778         21     0.7%
0.1666666667          21     0.7%
0.3333333333          20     0.7%
0.09090909091         18     0.6%
1                     17     0.6%
0.4                   17     0.6%
0.0625                16     0.5%
0.02380952381         16     0.5%
0.03571428571         15     0.5%
0.1333333333          15     0.5%
Other values (1345)   2809   94.1%

Minimum 10 values
Value                 Count  Frequency (%)
0.005449591281        1      < 0.1%
0.005464480874        1      < 0.1%
0.005494505495        1      < 0.1%
0.005509641873        1      < 0.1%
0.005586592179        2      0.1%
0.005602240896        1      < 0.1%
0.005617977528        2      0.1%
0.00566572238         1      < 0.1%
0.005681818182        2      0.1%
0.005698005698        3      0.1%

Maximum 10 values
Value                 Count  Frequency (%)
4                     1      < 0.1%
2                     1      < 0.1%
1.571428571           1      < 0.1%
1.5                   3      0.1%
1                     17     0.6%
0.8333333333          1      < 0.1%
0.75                  1      < 0.1%
0.6666666667          13     0.4%
0.6648793566          1      < 0.1%
0.6                   1      < 0.1%

qtde_returns
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
SKEWED
ZEROS

Distinct: 217
Distinct (%): 7.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 63.24020101
Minimum: 0
Maximum: 80995
Zeros: 1453
Zeros (%): 48.7%
Negative: 0
Negative (%): 0.0%
Memory size: 23.4 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
median: 1
Q3: 9
95th percentile: 103.8
Maximum: 80995
Range: 80995
Interquartile range (IQR): 9

Descriptive statistics

Standard deviation: 1509.254596
Coefficient of variation (CV): 23.86543009
Kurtosis: 2774.251512
Mean: 63.24020101
Median Absolute Deviation (MAD): 1
Skewness: 51.85337872
Sum: 188772
Variance: 2277849.436
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value                 Count  Frequency (%)
0                     1453   48.7%
1                     183    6.1%
2                     153    5.1%
3                     107    3.6%
4                     90     3.0%
6                     72     2.4%
5                     64     2.1%
8                     49     1.6%
12                    48     1.6%
7                     47     1.6%
Other values (207)    719    24.1%

Minimum 10 values
Value                 Count  Frequency (%)
0                     1453   48.7%
1                     183    6.1%
2                     153    5.1%
3                     107    3.6%
4                     90     3.0%
5                     64     2.1%
6                     72     2.4%
7                     47     1.6%
8                     49     1.6%
9                     38     1.3%

Maximum 10 values
Value                 Count  Frequency (%)
80995                 1      < 0.1%
9014                  1      < 0.1%
8060                  1      < 0.1%
4627                  1      < 0.1%
3768                  1      < 0.1%
3335                  1      < 0.1%
2975                  1      < 0.1%
2022                  1      < 0.1%
2012                  1      < 0.1%
1920                  1      < 0.1%

avg_basket_size
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
SKEWED

Distinct: 1983
Distinct (%): 66.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 246.3714787
Minimum: 1
Maximum: 40498.5
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.4 KiB

Quantile statistics

Minimum: 1
5th percentile: 44
Q1: 103
median: 172
Q3: 281
95th percentile: 598.64
Maximum: 40498.5
Range: 40497.5
Interquartile range (IQR): 178

Descriptive statistics

Standard deviation: 782.4497016
Coefficient of variation (CV): 3.175894003
Kurtosis: 2350.161191
Mean: 246.3714787
Median Absolute Deviation (MAD): 82.5
Skewness: 45.92326414
Sum: 735418.8639
Variance: 612227.5356
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value                 Count  Frequency (%)
100                   11     0.4%
82                    10     0.3%
114                   10     0.3%
60                    9      0.3%
136                   9      0.3%
73                    8      0.3%
86                    8      0.3%
150                   7      0.2%
64                    7      0.2%
130                   7      0.2%
Other values (1973)   2899   97.1%

Minimum 10 values
Value                 Count  Frequency (%)
1                     2      0.1%
1.5                   1      < 0.1%
2                     1      < 0.1%
3.333333333           1      < 0.1%
5.333333333           1      < 0.1%
5.666666667           1      < 0.1%
6.142857143           1      < 0.1%
7.5                   1      < 0.1%
9                     1      < 0.1%
9.5                   1      < 0.1%

Maximum 10 values
Value                 Count  Frequency (%)
40498.5               1      < 0.1%
6009.333333           1      < 0.1%
3684.47619            1      < 0.1%
2880                  1      < 0.1%
2697.465753           1      < 0.1%
2183.2                1      < 0.1%
2160.333333           1      < 0.1%
2141.5                1      < 0.1%
2082.225806           1      < 0.1%
2000                  1      < 0.1%

avg_unique_basket_size
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 1015
Distinct (%): 34.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 22.02564613
Minimum: 1
Maximum: 300.6470588
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.4 KiB

Quantile statistics

Minimum: 1
5th percentile: 3.478947368
Q1: 10
median: 17.22222222
Q3: 27.71428571
95th percentile: 56.71515152
Maximum: 300.6470588
Range: 299.6470588
Interquartile range (IQR): 17.71428571

Descriptive statistics

Standard deviation: 18.94048787
Coefficient of variation (CV): 0.8599288192
Kurtosis: 23.48302059
Mean: 22.02564613
Median Absolute Deviation (MAD): 8.222222222
Skewness: 3.169981848
Sum: 65746.55369
Variance: 358.7420806
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value                 Count  Frequency (%)
13                    54     1.8%
14                    44     1.5%
11                    35     1.2%
1                     33     1.1%
9                     32     1.1%
20                    31     1.0%
17                    29     1.0%
10                    29     1.0%
6                     28     0.9%
15                    28     0.9%
Other values (1005)   2642   88.5%

Minimum 10 values
Value                 Count  Frequency (%)
1                     33     1.1%
1.2                   1      < 0.1%
1.25                  1      < 0.1%
1.333333333           2      0.1%
1.5                   9      0.3%
1.555555556           1      < 0.1%
1.571428571           1      < 0.1%
1.666666667           4      0.1%
1.833333333           1      < 0.1%
1.9                   1      < 0.1%

Maximum 10 values
Value                 Count  Frequency (%)
300.6470588           1      < 0.1%
203.5                 1      < 0.1%
149                   1      < 0.1%
145.3333333           1      < 0.1%
136.25                1      < 0.1%
135.75                1      < 0.1%
127                   1      < 0.1%
122                   1      < 0.1%
118                   1      < 0.1%
114                   1      < 0.1%

Interactions

[Figure: pairwise interaction plots for all 12 × 12 combinations of the numeric variables]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, where -1 indicates total negative linear correlation, 0 indicates no linear correlation, and 1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, which implies that for an exact linear relationship the steepness of the line does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
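A minimal plain-Python sketch of this definition, dividing the covariance by the product of the standard deviations. The helper name `pearson_r` is hypothetical and is not the profiling library's API. The example also illustrates the slope invariance: exactly linear data gives r = +1 or -1 for any non-zero slope.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: cov(X, Y) / (std(X) * std(Y))."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample covariance and standard deviations; the 1/(n - 1) factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    if std_x == 0 or std_y == 0:
        raise ValueError("r is undefined when either variable is constant")
    return cov / (std_x * std_y)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
# Exactly linear data: only the sign of the slope matters, not its steepness.
print(round(pearson_r(x, [2 * v + 1 for v in x]), 12))    # 1.0
print(round(pearson_r(x, [0.1 * v - 3 for v in x]), 12))  # 1.0
print(round(pearson_r(x, [-5 * v for v in x]), 12))       # -1.0
```

Rounding only hides floating-point noise in the last digits.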

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the rank variables' standard deviations. In other words, ρ is Pearson's r computed on the ranks instead of the raw values.
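A minimal plain-Python sketch of ρ as Pearson's r applied to ranks. The helper names are hypothetical, and tied values receive the average of the ranks they span, which is the usual convention. This is an illustration, not the code that produced this report. The example data are monotonic but far from linear, so ρ stays at 1 while r drops.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Pearson's r written as S_xy / sqrt(S_xx * S_yy); the 1/(n - 1) factors cancel."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

x = [1, 2, 3, 4, 5]
y = [1, 2, 3, 4, 100]  # monotonic, but far from linear
print(spearman_rho(x, y))         # 1.0
print(round(pearson_r(x, y), 3))  # 0.725
```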

Kendall's τ

Like Spearman's rank correlation coefficient, Kendall's rank correlation coefficient (τ) measures the ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

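To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. A pair is concordant if X and Y order its two observations the same way, and discordant if they order them oppositely. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2. This is the τ-a variant; when the data contain ties, implementations commonly use τ-b, which adjusts the denominator for tied pairs.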
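A direct O(n²) sketch of this pair-counting definition (τ-a), in which a pair tied in X or Y counts as neither concordant nor discordant. The helper name `kendall_tau_a` is hypothetical. Library implementations typically use faster algorithms, and their results can differ from this one when the data contain ties.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1  # both variables order the pair the same way
        elif sign < 0:
            discordant += 1  # the variables order the pair oppositely
        # sign == 0: tie in X or Y, counted as neither
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

print(kendall_tau_a([1, 2, 3, 4], [10, 20, 30, 40]))  # 1.0
print(kendall_tau_a([1, 2, 3, 4], [40, 30, 20, 10]))  # -1.0
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))            # (2 concordant - 1 discordant) / 3 pairs
```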

Phik (φk)

Phik (φk) is a correlation coefficient that works consistently across categorical, ordinal, and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available in the phik package's documentation.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completeness visually.
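The numbers behind these two plots are simple to compute. Below is a minimal plain-Python sketch. It assumes rows are dicts and that `None` or a float NaN marks a missing cell, and both helper names are hypothetical. With pandas, `df.notna().sum()` gives the per-column non-null counts directly. The toy rows are made up for illustration, since this dataset has no missing cells.

```python
import math

def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def nullity_counts(rows, columns):
    """Per column: (non-missing count, missing count), the data behind the count bars."""
    counts = {}
    for col in columns:
        missing = sum(1 for row in rows if is_missing(row.get(col)))
        counts[col] = (len(rows) - missing, missing)
    return counts

def nullity_matrix(rows, columns):
    """One string per row: '#' where a value is present, '.' where it is missing."""
    return ["".join("." if is_missing(row.get(c)) else "#" for c in columns) for row in rows]

# Hypothetical toy rows with two missing cells.
columns = ["a", "b", "c"]
rows = [
    {"a": 1.0, "b": 2.0, "c": 3.0},
    {"a": 4.0, "b": None, "c": 6.0},
    {"a": 7.0, "b": 8.0, "c": float("nan")},
]
print(nullity_counts(rows, columns))  # {'a': (3, 0), 'b': (2, 1), 'c': (2, 1)}
print(nullity_matrix(rows, columns))  # ['###', '#.#', '##.']
```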

Sample

First rows

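| | df_index | customer_id | gross_revenue | recency_days | invoice_no | quantity | avg_ticket | avg_recency_days | frequency | qtde_returns | avg_basket_size | avg_unique_basket_size |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 17850 | 5391.21 | 372.0 | 34.0 | 6.0 | 18.152222 | 35.500000 | 0.486111 | 40.0 | 50.970588 | 8.735294 |
| 1 | 1 | 13047 | 3237.54 | 31.0 | 10.0 | 12.0 | 18.822907 | 26.307692 | 0.052478 | 36.0 | 139.100000 | 17.200000 |
| 2 | 2 | 12583 | 7281.38 | 2.0 | 15.0 | 25.0 | 29.479271 | 21.823529 | 0.048387 | 51.0 | 337.333333 | 16.466667 |
| 3 | 3 | 13748 | 948.25 | 95.0 | 5.0 | 8.0 | 33.866071 | 92.666667 | 0.017921 | 0.0 | 87.800000 | 5.600000 |
| 4 | 4 | 15100 | 876.00 | 333.0 | 3.0 | 2.0 | 292.000000 | 8.600000 | 0.136364 | 22.0 | 26.666667 | 1.000000 |
| 5 | 5 | 15291 | 4668.30 | 25.0 | 15.0 | 17.0 | 45.323301 | 21.750000 | 0.057307 | 29.0 | 140.200000 | 6.866667 |
| 6 | 6 | 14688 | 5630.87 | 7.0 | 21.0 | 24.0 | 17.219786 | 18.300000 | 0.073569 | 399.0 | 172.428571 | 15.571429 |
| 7 | 7 | 17809 | 5411.91 | 16.0 | 12.0 | 23.0 | 88.719836 | 32.454545 | 0.041899 | 42.0 | 171.416667 | 5.083333 |
| 8 | 8 | 15311 | 60767.90 | 0.0 | 91.0 | 43.0 | 25.543464 | 4.144444 | 0.315508 | 474.0 | 419.714286 | 26.142857 |
| 9 | 9 | 14527 | 8508.82 | 2.0 | 55.0 | 15.0 | 8.753930 | 5.888889 | 0.231183 | 40.0 | 37.981818 | 17.672727 |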

Last rows

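| | df_index | customer_id | gross_revenue | recency_days | invoice_no | quantity | avg_ticket | avg_recency_days | frequency | qtde_returns | avg_basket_size | avg_unique_basket_size |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2975 | 5808 | 17727 | 1060.25 | 15.0 | 1.0 | 11.0 | 16.064394 | 6.0 | 0.285714 | 6.0 | 645.000000 | 66.000000 |
| 2976 | 5818 | 17232 | 421.52 | 2.0 | 2.0 | 10.0 | 11.708889 | 12.0 | 0.153846 | 0.0 | 101.500000 | 18.000000 |
| 2977 | 5819 | 17468 | 137.00 | 10.0 | 2.0 | 2.0 | 27.400000 | 4.0 | 0.400000 | 0.0 | 58.000000 | 2.500000 |
| 2978 | 5830 | 13596 | 697.04 | 5.0 | 2.0 | 10.0 | 4.199036 | 7.0 | 0.250000 | 0.0 | 203.000000 | 83.000000 |
| 2979 | 5836 | 14893 | 1237.85 | 9.0 | 2.0 | 14.0 | 16.956849 | 2.0 | 0.666667 | 0.0 | 399.500000 | 36.500000 |
| 2980 | 5840 | 12479 | 527.20 | 11.0 | 1.0 | 8.0 | 17.006452 | 4.0 | 0.333333 | 34.0 | 385.000000 | 31.000000 |
| 2981 | 5861 | 14126 | 706.13 | 7.0 | 3.0 | 6.0 | 47.075333 | 3.0 | 1.000000 | 50.0 | 169.333333 | 5.000000 |
| 2982 | 5867 | 13521 | 1093.65 | 1.0 | 3.0 | 9.0 | 2.508372 | 4.5 | 0.300000 | 0.0 | 245.333333 | 145.333333 |
| 2983 | 5877 | 15060 | 303.09 | 8.0 | 4.0 | 8.0 | 2.504876 | 1.0 | 2.000000 | 0.0 | 65.750000 | 30.250000 |
| 2984 | 5896 | 12558 | 269.96 | 7.0 | 1.0 | 5.0 | 24.541818 | 6.0 | 0.285714 | 196.0 | 196.000000 | 11.000000 |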